
Conversation


@ronakrm ronakrm commented Nov 6, 2025

Summary

This PR adds prompt caching support for Anthropic models, allowing users to cache parts of prompts (system prompts, long context, tools) to reduce costs by ~90% for cached tokens.

This is a simplified, Anthropic-only implementation based on the work in #2560, following the maintainer's suggestion to "launch this for just Anthropic first."

Core Implementation

  • Added CachePoint class: a simple marker that can be inserted into user prompts to indicate cache boundaries (sketched just after this list)
  • Implemented cache control in AnthropicModel: uses BetaCacheControlEphemeralParam to add cache_control to content blocks
  • Added cache metrics mapping: cache_write_tokens and cache_read_tokens are tracked automatically via genai-prices
  • CachePoint is passed through to all other models, which ignore it
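
A rough sketch of the marker (not the literal implementation from this PR; the kind discriminator field is an assumption):

from dataclasses import dataclass
from typing import Literal

@dataclass
class CachePoint:
    """Marks a cache boundary after the preceding content block.

    Supported by: Anthropic (other providers filter it out).
    """

    # Hypothetical discriminator, mirroring other content part types.
    kind: Literal['cache-point'] = 'cache-point'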

Example Usage

from pydantic_ai import Agent, CachePoint

agent = Agent('anthropic:claude-sonnet-4-5')

result = await agent.run([
    LONG_CONTEXT,      # Long documentation or context
    CachePoint(),      # Mark cache boundary - everything before will be cached
    'Your question here',
])

# First request: cache_write_tokens > 0 (writes to cache)
# Subsequent requests: cache_read_tokens > 0 (reads from cache with 90% discount)
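
The cache metrics can then be inspected on the run usage. A minimal sketch, assuming the counters are exposed as fields on result.usage() under the names this PR introduces:

usage = result.usage()
print(usage.cache_write_tokens)  # > 0 on the first (cache-writing) request
print(usage.cache_read_tokens)   # > 0 on subsequent (cache-reading) requests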

Testing

  • Basic cache control application
  • Multiple cache points in single prompt
  • Error handling (CachePoint as first content)
  • Different content types (images)
  • Confirmed working against the live Anthropic API, with the expected cache metrics visible in the Anthropic/Claude console

Compatibility

  • Added CachePoint filtering in other model providers (e.g., OpenAI) for graceful degradation
  • Models that don't support caching simply filter out CachePoint markers
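
The degradation is just a filter over the user content parts. A minimal sketch, with a hypothetical helper name and signature:

from pydantic_ai import CachePoint

def drop_cache_points(user_content: list) -> list:
    """Remove CachePoint markers for providers that don't support caching."""
    return [part for part in user_content if not isinstance(part, CachePoint)]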

Real-World Test Results

Tested with live Anthropic API:
Request 1 (cache write): cache_write_tokens=3264
Request 2 (cache read): cache_read_tokens=3264
Request 3 (cache read): cache_read_tokens=3264
Total savings: ~5875 token-equivalents

I can likely create a stacked PR to add system prompt caching for Anthropic as well (this would require updating _map_message and related code to always use a list of blocks, and string system prompts should probably be detected and mapped into the JSON block format).
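
A rough sketch of that normalization (hypothetical names; Anthropic's system parameter accepts a list of text blocks):

def normalize_system(system_prompt, cache_instructions: bool) -> list[dict]:
    """Map a string system prompt to Anthropic's block format, caching the last block."""
    if isinstance(system_prompt, str):
        system_blocks = [{'type': 'text', 'text': system_prompt}]
    else:
        system_blocks = list(system_prompt)
    if cache_instructions and system_blocks:
        system_blocks[-1]['cache_control'] = {'type': 'ephemeral'}
    return system_blocks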

@ronakrm ronakrm force-pushed the anthropic-prompt-caching-only branch 2 times, most recently from 791999d to 5b5cb9f Compare November 7, 2025 04:26
@DouweM DouweM self-assigned this Nov 7, 2025
"""Add cache control to the last content block param."""
if not params:
raise UserError(
'CachePoint cannot be the first content in a user message - there must be previous content to attach the CachePoint to.'
Collaborator:

Copying in context from https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#what-can-be-cached:

Tools: Tool definitions in the tools array
System messages: Content blocks in the system array
Text messages: Content blocks in the messages.content array, for both user and assistant turns
Images & Documents: Content blocks in the messages.content array, in user turns
Tool use and tool results: Content blocks in the messages.content array, in both user and assistant turns

I think we should support inserting a cache point after tool defs and system messages as well.

In the original PR I suggested doing this by supporting CachePoint as the first content in a user message (by adding it to whatever came before it: the system message, tool definition, or the last message of the assistant output), but that doesn't really feel natural from a code perspective.

What do you think about adding anthropic_cache_tools and anthropic_cache_instructions fields to AnthropicModelSettings, and setting cache_control on the relevant parts when set?
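
From the user's side that could look roughly like this (a sketch of the proposed settings; anthropic_cache_tools was later renamed anthropic_cache_tool_definitions in this PR):

from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModelSettings

agent = Agent(
    'anthropic:claude-sonnet-4-5',
    model_settings=AnthropicModelSettings(
        anthropic_cache_instructions=True,  # cache_control on the last system prompt block
        anthropic_cache_tools=True,         # cache_control on the last tool definition
    ),
)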

Author:

Seems reasonable, I'll look into it!

Collaborator:

Let's update the message here to make it clear that they are likely looking for one of the 2 settings instead.

@DouweM
Copy link
Collaborator

DouweM commented Nov 7, 2025

@ronakrm If you're up for it, I'd welcome Bedrock support in this PR as well. It'll have that one bug (#2560 (comment)) but most users won't hit it, and it's clearly on their side to fix, not ours. Initially I thought we should hold off until they'd fixed it, but I'd rather just get this out for most people who won't hit the issue anyway.

@ronakrm
Copy link
Author

ronakrm commented Nov 8, 2025

@ronakrm If you're up for it, I'd welcome Bedrock support in this PR as well. It'll have that one bug (#2560 (comment)) but most users won't hit it, and it's clearly on their side to fix, not ours. Initially I thought we should hold off until they'd fixed it, but I'd rather just get this out for most people who won't hit the issue anyway.

I can take a stab at this, but I'm a bit concerned about scope creep sapping my momentum and delaying this work, and I currently have no way to test a live Bedrock example. I'd like to first finish a full pass on the pure-Anthropic side, if that's alright with you.

(Also, I'm not sure what your timeline is for this, but I should be able to make another pass in the next few days.)

@DouweM
Copy link
Collaborator

DouweM commented Nov 10, 2025

I can take a stab at this, but I'm a bit concerned about scope creep sapping my momentum and delaying this work, and I currently have no way to test a live Bedrock example. I'd like to first finish a full pass on the pure-Anthropic side, if that's alright with you.

Sounds good! I can then do Bedrock in a follow up PR.

(Also, I'm not sure what your timeline is for this, but I should be able to make another pass in the next few days.)

That's great, thanks.


@ronakrm ronakrm force-pushed the anthropic-prompt-caching-only branch 2 times, most recently from b1f6d6c to 7ef071f Compare November 12, 2025 21:54
ronakrm and others added 11 commits November 12, 2025 14:12
This implementation adds prompt caching support for Anthropic models,
allowing users to cache parts of prompts (system prompts, long context,
tools) to reduce costs by ~90% for cached tokens.

Key changes:
- Add CachePoint class to mark cache boundaries in prompts
- Implement cache control in AnthropicModel using BetaCacheControlEphemeralParam
- Add cache metrics mapping (cache_creation_input_tokens → cache_write_tokens)
- Add comprehensive tests for CachePoint functionality
- Add working example demonstrating prompt caching usage
- Add CachePoint filtering in OpenAI models for compatibility

The implementation is Anthropic-only (removed Bedrock complexity from
original PR pydantic#2560) for a cleaner, more maintainable solution.

Related to pydantic#2560

- Fix TypedDict mutation in anthropic.py using cast()
- Handle CachePoint in otel message conversion (skip for telemetry)
- Add CachePoint handling in all model providers for compatibility
- Models without caching support (Bedrock, Gemini, Google, HuggingFace, OpenAI) now filter out CachePoint markers

All pyright type checks now pass.
Adding CachePoint handling pushed method complexity over the limit (16 > 15).
Added noqa: C901 to suppress the complexity warning.
- Add test_cache_point_in_otel_message_parts to cover CachePoint in otel conversion
- Add test_cache_control_unsupported_param_type to cover unsupported param error
- Use .get() for TypedDict access to avoid type checking errors
- Add type: ignore for testing protected method
- Restore pragma: lax no cover on google.py file_data handling

- Add test_cache_point_filtering for OpenAI, Bedrock, Google, and Hugging Face
- Tests verify CachePoint is filtered out without errors
- Achieves 100% coverage for CachePoint code paths

This commit addresses maintainer feedback on the Anthropic prompt caching PR:

- Add anthropic_cache_tools field to cache last tool definition
- Add anthropic_cache_instructions field to cache system prompts
- Rewrite existing CachePoint tests to use snapshot() assertions
- Add comprehensive tests for new caching settings
- Remove standalone example file, add docs section instead
- Move imports to top of test files
- Remove ineffective Google CachePoint test
- Add "Supported by: Anthropic" to CachePoint docstring
- Add Anthropic docs link in cache_control method

Tests are written but snapshots not yet generated (will be done in next commit).
@ronakrm ronakrm force-pushed the anthropic-prompt-caching-only branch from 7ef071f to 57d051a Compare November 12, 2025 22:18
@ronakrm
Copy link
Author

ronakrm commented Nov 13, 2025

So I think the current Python 3.11 + lowest-direct test failure is unrelated to this PR: it doesn't appear to be caused by the CachePoint feature, but by a pre-existing Python 3.11 issue.

The test test_openai_responses.py crashes with Fatal Python error: Illegal instruction, only on Python 3.11 with --resolution lowest-direct. It passes on the main branch but fails on this PR branch.

Debugging with Claude Code suggests the following chain:

  1. This PR adds CachePoint to the UserContent TypeAlias, expanding it from 6 to 7 types
  2. This changes Python 3.11's module import timing due to PEP 659 optimizations
  3. The timing change causes numpy/matplotlib C extensions to load at a slightly different moment
  4. The oldest compatible versions (from lowest-direct) contain CPU instruction incompatibilities that trigger "illegal instruction" errors

I don't think pure-Python changes can cause "illegal instruction" errors on their own; that's almost always a C extension issue. The crash seems timing-dependent: imports work on main, but this branch's slightly different timing exposes the C extension bug (affecting parallel test execution via the pytest-xdist worker processes).

Unfortunately I can't reproduce this locally because I don't have CUDA, and --resolution lowest-direct tries to build vllm 0.1.3 which requires CUDA:

RuntimeError: Cannot find CUDA_HOME. CUDA must be available in order to build the package.

Claude recommends:

  1. Skip Python 3.11 lowest-direct as a known CI limitation
  2. Pin minimum versions for numpy/matplotlib on Python 3.11 (Probably have to test to confirm this would resolve the issue)
  3. Merge as-is since all other test configurations pass and the code is correct

@DouweM Any thoughts or recommendations?

ronakrm and others added 3 commits November 12, 2025 23:10
- Add test_cache_point_with_streaming to verify CachePoint works with run_stream()
- Add test_cache_point_with_unsupported_type to verify error handling for non-cacheable content types
- Add test_cache_point_in_user_prompt to verify CachePoint is filtered in OpenTelemetry conversion
- Fix test_cache_point_filtering in test_google.py to properly test _map_user_prompt method
- Enhance test_cache_point_filtering in test_openai.py to directly test both Chat and Responses models
- Add test_cache_point_filtering_responses_model for OpenAI Responses API

These tests increase diff coverage from 68% to 98% (100% for all production code).

- Move CachePoint imports to top of test files (test_bedrock.py, test_huggingface.py)
- Add documentation link for cacheable_types in anthropic.py

Addresses feedback from @DouweM in PR pydantic#3363

@DouweM
Copy link
Collaborator

DouweM commented Nov 13, 2025

  • Skip Python 3.11 lowest-direct as a known CI limitation

@ronakrm Done! 1df9ca6

Is this ready for review again or did you still have some changes planned?

@ronakrm ronakrm marked this pull request as ready for review November 13, 2025 21:48
- Add explicit list[ModelMessage] type annotations in test_instrumented.py
- Fix pyright ignore comment placement in test_openai.py
- Remove unnecessary type ignore comments

Fixes CI pyright errors reported on Python 3.10


# Add cache_control to the last tool if enabled
if tools and model_settings.get('anthropic_cache_tools'):
    last_tool = cast(dict[str, Any], tools[-1])
Collaborator:

Why do we have to cast it? I'd rather change the type of BetaToolUnionParam to not be a union, so that we can be sure it's a (typed)dict here.

@ronakrm ronakrm commented Nov 14, 2025

BetaToolUnionParam is an upstream anthropic package type; I think this is the best we can do for now?

(This cast was unneeded, but the one at ~L700 is needed; I've added a comment there.)

"""Add cache control to the last content block param."""
if not params:
raise UserError(
'CachePoint cannot be the first content in a user message - there must be previous content to attach the CachePoint to.'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's update the message here to make it clear that they are likely looking for one of the 2 settings instead.

# Only certain types support cache_control
# See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#what-can-be-cached
cacheable_types = {'text', 'tool_use', 'server_tool_use', 'image', 'tool_result'}
last_param = cast(dict[str, Any], params[-1]) # Cast to dict for mutation
Collaborator:

This didn't work without the cast?
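
For context, a minimal sketch of why the cast is needed, assuming the anthropic SDK's beta type names used in this PR: the list holds a union of TypedDicts, and pyright rejects assigning a key that isn't declared on every union member.

from typing import Any, cast

from anthropic.types.beta import BetaCacheControlEphemeralParam, BetaContentBlockParam

def add_cache_control(params: list[BetaContentBlockParam]) -> None:
    """Attach cache_control to the last content block (mutates it in place)."""
    # Cast the union-of-TypedDicts member to a plain dict so the optional
    # cache_control key can be assigned without a type error.
    last_param = cast(dict[str, Any], params[-1])
    last_param['cache_control'] = BetaCacheControlEphemeralParam(type='ephemeral')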

ronakrm and others added 5 commits November 13, 2025 15:48
- Rename anthropic_cache_tools to anthropic_cache_tool_definitions for clarity
- Add backticks to docstrings for code identifiers (cache_control, tools)
- Improve error message to mention alternative cache settings
- Remove unnecessary cast for BetaToolUnionParam (line 441)
- Add explanatory comment for necessary cast of BetaContentBlockParam (line 703)
- Update Bedrock comment to link to issue pydantic#3418

All tests pass with these changes.

These mock tool functions are never actually called during tests,
so their return statements don't need coverage.

Achieves 100% coverage for test_anthropic.py.

@ronakrm ronakrm requested a review from DouweM November 14, 2025 01:17
ronakrm and others added 2 commits November 13, 2025 17:26
- Add proper mkdocs cross-reference links to anthropic_cache_instructions
- Add proper mkdocs cross-reference links to anthropic_cache_tool_definitions
- Link to model settings documentation section

Per maintainer feedback on PR pydantic#3363.

Replace three separate sections (each with individual examples) with:
- A concise bulleted list of the three caching methods
- One comprehensive example showing all three methods combined

This reduces repetition and makes the documentation more scannable.

Per maintainer feedback on PR pydantic#3363.
